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Abstract 

This paper discusses asymptotic theory for penalized spline estimators in generalized 
additive models. The purpose of this paper is to establish the asymptotic bias and variance 
as well as the asymptotic normality of the penalized spline estimators proposed by Marx and 
Eilers (1998). Furthermore, the asymptotics for the penalized quasi likelihood fit in mixed 
models are also discussed. 

Keywords Asymptotic normality, i?-spline, Generalized additive model, Mixed model, Pe- 
nalized spline. 

1 Introduction 

The generalized additive model(GAM) is a typical regression model, in which the relationship 
between the one-dimensional response Y and the multidimensional explanatory x = (x±, ■ ■ ■ ,%d) 
is modeled by a link function g(-), as follows: 

g(E[Y\X = x\) = n(x) = 771 On) + • • • + n D {x D ), 

where each rjj(j = 1, • • • , D) is a univariate regression function. If Y has a Gaussian distribution, 
then g is the identity function and, hence, n{x) = E\Y\X = x\. Additionally, the GAM can 
specify a distribution such as a Bernoulli, Poisson or Gamma distribution. In GAMs, the purpose 
is often to estimate n. The parametric and the nonparametric estimation techniques of n have 
been established by several authors (see Hastie et al. (1990) and Wood (2006)). In this paper, 
n is estimated via the penalized spline method. Penalized splines were introduced by O'Sullivan 
(1986) and Eilers and Marx (1996), and are recognized as an efficient technique for GAMs. 
Applications and theories of penalized splines in GAMs have been widely discussed, including 
by Aerts et al. (2002) and Ruppert et al. (2003). 

To construct the estimator of ?7j's, a repetition update method, the so-called backfitting 
algorithm, is often used. However, when the response has a non-Gaussian distribution, such as 
a Bernoulli or Poisson distribution, the overall estimation procedure becomes complicated and 
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its computation time grows, because the estimators are obtained by using a blend of backfitting 
and the Fisher-scoring algorithm. On the other hand, Marx and Eilers (1998) proposed a new 
penalized spline estimator without backfitting algorithms, which is denoted as the ridge corrected 
penalized spline estimator (RCPS). We will briefly describe the RCPS as we will focus on it later. 
The penalized spline estimator is obtained based on maximization of the penalized log-likelihood. 
However, it appears difficult to obtain the maximizer of the penalized log-likelihood i as the 
Hessian of i is not invertible. The RCPS is constructed based on the maximization of £j, which 
is i plus an additional ridge penalty. Since the Hessian of £ 7 is invertible, the maximizer of £ 7 
can be obtained via the Fisher-scoring algorithm. Thus, it is easy to construct the RCPS. 

In univariate models(Z) = 1), Hall and Opsomer (2005), Claeskens et al. (2009), Kauermann 
et al. (2009) and Wang et al. (2011) researched the asymptotic properties of penalized spline 
estimators. Recently, Yoshida and Naito (2012) worked on the asymptotic distribution of pe- 
nalized splines in an additive model. In contrast, Horowitz and Mammen (2004), Linton (2000) 
and Yu et al. (2008) studied the asymptotics for the kernel estimator in a GAM. However, 
the asymptotic results for penalized spline estimators in GAMs have not yet been sufficiently 
developed like they have been for their applications. 

In this paper, the asymptotics for penalized splines in a GAM are discussed. Our main 
purpose is to establish the asymptotic normality of the RCPS. Kauermann et al. (2009) showed 
the asymptotic normality of the penalized spline estimator in generalized linear models(GLM). 
Hence, the results in this paper generalize the results of Kauermann et al. (2009). Furthermore, 
penalized spline smoothing can be linked to mixed models(see Lin and Zhang (1999) and Ruppert 
et al. (2003)). In generalized additive mixed models (GAMM), the penalized quasi likelihood 
method (PQL) is an efficient method for obtaining the estimator and predictor. We also show 
the asymptotic normality of the PQL fit. 

This paper is organized as follows. In Section 2, the GAM is defined and the RCPS is 
constructed by the Fisher-scoring algorithm. Section 3 shows the asymptotic normality of the 
RCPS. Section 4 states the asymptotics for the PQL fits in a GAMM. In Section 5, the appli- 
cations of the approximate confidence interval are addressed. Section 6 provides a numerical 
study to validate the asymptotic normality of the RCPS. Related discussions and issues for 
future research are addressed in Section 7, and proofs for theoretical results are given in the 
Appendix. 

2 Penalized spline estimator in a GAM 
2.1 Generalized additive spline model 

For the dataset {(j/i, Xj)\i = 1, • • • , n}, consider an exponential family of the generalized additive 
model with a canonical link function 

f(yi\Xi,rj) = exp I + h(yi, cp) I , i = l,---,n, (1) 
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where xi = (xn, ■ ■ ■ ,Xio) is a D-variate explanatory variable, rj{-) is an unknown natural pa- 
rameter which has the additive formation 

T](x) = r/i(xi) H h 7] D (xd), 

where rjj's is an unknown univariate function, (ft is a dispersion parameter, and c and h are 
known functions. The canonical link function indicates g^ 1 = d, which leads to -Ef^l-X^ = 
Xi] = g^ 1 (rj(xi)) = c'(r](xi)) and V[5^|Jtj = xi\ = (ftd'(rj(xi)), where d and d 1 are the first and 
second derivatives of c. More general settings concerned with the link function were clarified 
by McCullagh and Nelder (1989). In this paper, for the natural parameter 97(05), we assume 
that E[rjj(Xj)] = 0(j = 1, • • • , D) to ensure the identifiability of rjj. For simplicity, we hereafter 
ignore the role of the dispersion parameters in ([1]) and set (ft = 1, thus denoting h(y, (ft) = h(y). 

Our purpose is to estimate r\j using nonparametric spline methods. We now prepare the 
5-spline model 

Kn 

s(x)= £ Bf{x)b Kj , j = l,--,D 

k=-p+l 

as an approximation to rjj(x), where B^\x){k = —p + 1, • • • , K n ) are the pth .B-spline functions 
defined recursively as 

B[%) = 



1, K k -l < X < K k , 

0, otherwise, 



™k+p—l rtfe— 1 ™k+p rvk 

where K k (k = —p + 1, • • • , K n + p) are knots and b k j's is an unknown parameter. Some funda- 
mental properties of £>-splines were detailed by de Boor (2001). We will write B^\x) = B k (x) 
unless we specify the degree of -B-splines, because we will mainly focus on the pth 5-spline from 
now on. The suggested density of Yi is defined as 

f(yi\xi, b) = exp (yiZ(xi) T b - c(Z(xi) T b) + h(yi)) , 

where Z(x) = (B(x 1 f • • • B(x D f f, B{x) = (B_ p+1 (x) ■ ■ ■ B Kn (x)) T ', b = (bj ■ ■ ■ b T D ) T and 
bj = • • • bK n ,j) T ■ Using the estimator bj = (b- p +xj ■ ■ ■ bK n j) T of bj, the estimator of 

r)j(xj) can be defined as 

Vj(xj)= ^2 B k( x j)h,j, j = V"-, D . 
fe=-p+i 

2.2 The penalized spline estimator 

To estimate b, we prepare the log-likelihood 

1 n 1 1 

- V log f( yi \ Xi , b) = -{y T (Zb) - l T c(Zb)} + -l T h(y), 
n i f— ' n n 
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where y = (yi ■■■ y n ) T , c(Zb) = (c{Z(xi) T b) ■■■ c(Z(x n ) T b)) T , Z k = (B^ p+j (x ik ))ij, Z = 
\Z\ ■ ■ ■ Zd], and h(y) = (h(yi), • • • , h(y n )) T . It is known that the spline estimator obtained by 
maximization of the log-likelihood tends to display 'wiggle behavior'. Hence, we consider using 
the penalized spline estimator to obtain a smooth curve. Define the penalized log-likelihood as 
follows 

n D » 

£(b, A n ) = -^log/^K^-^^&jA^A^ 

n i=i j=i n 

= -{y T (Zb)-l T c(Zb)} + -l T h(y)-^b T Q m (X n )b, (2) 
n n An 

where Xj n is the smoothing parameter (j = 1, • • • , D), the (K n + p — m) x (K n + p)th matrix 
A m is the mth difference matrix, which is given by Marx and Eilers (1998) and Q m (A n ) = 
diag[Ai n A^A m ••• A£) n A^A m ]. In general, the maximizer of ([2]) is obtained by the Fisher- 
scoring algorithm. As in the typical problem of spline methods in a GAM, however, the Hessian 
of £(b, X n ) is not invertible and so the Fisher-scoring method is not usable directly. To overcome 
this singularity problem, we can use backfitting algorithms (see Hastie and Tibshirani (1990)). 
However, the overall algorithm becomes complicated and the computation grows (see Section 
1). These problems were discussed by Marx and Eilers (1998). We will next review the ridge 
corrected penalized spline estimator. 

2.3 The ridge corrected penalized spline estimator 

Marx and Eilers (1998) proposed a nice estimation method for b without using a backfitting 
algorithm for penalized splines in the GAM context. They defined the ridge corrected penalized 
log-likelihood as 

£(b,X n , ln )=£(b,X n )-^-b T b, (3) 

where j n > 0. Let b = (b 1 ■ ■ ■ b D ) T be the maximizer of (|3|), which can be obtained directly via 
the Fisher-scoring method since the Hessian of £(b, X n , is invertible. The gradient G(b, X n , j n ) 
and Hessian H(b, A n ,7„) of £(b, A n ,7„) are obtained with 

G(6,An,7n) = ^^^ = -{Z T y-Z T c'(Zb)}--Q m (X n )b-^b, 

ob n n n 

#(Mn,7n) = dHi \:tlT ln) = --Z T di ag [c"(Zb)]Z - -Q m (X n ) - ^J, 
dbdb 1 n n n 

where c'(Zb) and c"(Zb) are n-vectors defined in the same manner as c(Zb). The fe-step iterated 
estimator of 6 can be written as 



&W = (Z T W (k -^Z + Q m (X n ) + ln I)- 1 Z T W^ [Zb«-V + (W^)- 1 {y - c'(Zb^)}} , 

where W^ 1 ^ = diagfc"^^ -1 ))]. As k -> oo, & (/c) converges to b if the initial b^ is appropri- 
ately chosen. The RCPS of f]j(xj) can be obtained as fjj(xj) = B(xj) T bj. In the next section, 
we discuss the asymptotic properties of [771(37) • • • fj£>(xD)] T ■ 
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3 Asymptotic theory 

Here, we list some assumptions regarding the asymptotics of the penalized spline estimator. 



Assumptions 

1. The explanatory X = (X±, ■ ■ ■ ,Xjj) is distributed as P(x) on [0, l] D , where [0, 1] D is the 
D-variate unit cube. 

2. For j = 1, • • • , D, r]j G C p+1 and c G C 3 . 

3. The knots for the .B-spline basis are equidistantly located with n k = k/K n (k = —p + 
1, • • • , K n + p) and the number of knots satisfies K n = o(n 1 / 2 ). 

4. For the non-singularity of H(b, A n ,7„), K n is chosen such that D(K n + p) < n. 

5. The smoothing parameters Xj n (j = 1, • • • , -D) are positive sequences such that \J n is larger 
than the maximum eig envalue of (ZjZ j )~ 1 / 2 A^A m (ZjZ i )- 1 /2. 

6. Lastly, 7„ = o(X n K~ m ), where A„ = ma.Xj{X jn }. 

For a random variable U n , E[U n \X n ] and V[C/ n |X n ] denote the conditional expectation and 
variance of U n given (Xi, • • • , X n ) = (xi, • • • , a; n ), respectively. Define the (K n + p)th square 
matrix Gt = (Gk,ij)ij, where the (i,j)-th component is 



G k ,ij = / c"(rj(x))B_ p+i (x k )B- p+j (x k )dP(x). 
J\o,i] D 



Using this, we get Tj(X jn ) = (Gj + (X jn /n)A' m A m ). Let 



{bio ■ ■ ■ b Dof = argmin J - E 
b I n ,; = i 



log 



/(li|aJi,6) 



X n 



(4) 



and let t^oC^j) = B(xj) T b j0 (j = !,■■■ ,D). 

The asymptotic bias of fjj (xj ) can be written as 

E[fjj(xj)\X n ] - rjj(xj) = E[rjj(xj)\X n ] - Vjo{xj) + Vjofa) - Vj(xj)- 

In the following Proposition [[J the difference Tjjo(xj) — rjj( x j) 1S asymptotically evaluated. The 
asymptotics for fjj(xj) — T]jo(xj) = B(xj) T (bj — bjo) can be shown in the following Theorem [1] 
by using the Taylor expansion of G(b, \ n ,ln) around 6o (see Lemma H] in the Appendix), the 
properties of a partitioned matrix of H(b, A n ,7„) and its asymptotic results. 

Proposition 1. Under the Assumptions, for j = 1, ■ ■ ■ ,D, 

Vjo(xj) ~ Vj(xj) = bj, a (xj) + o(K- (p+1) ), 
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where 

bUx) = - K r\ P +i)ih x p+1 w^w ' 

I(a < x < b) is the indicator function of an interval (a,b) and Br p (x) is the pth Bernoulli 
polynomial. 

Theorem 1. Under the Assumptions, for j = 1, • • ■ , D, 

Efaixj^Xn] - Vj ( Xj ) = b jA ( Xj ) + bjjSxj) + o P (K-^) + o P {\ jn K l - m n- 1 ), 

F[%(^)|x n ] = ^(x/r i (A jn )- 1 r J (o)r 3 (A 3n )- 1 B(x i )(i + o P (i)) 

= P (K n /n), 
Cov(fji(xi),fjj(xj)) = P {n~ l ), 

where bj )CL {xj) is given in Proposition^ 

W*) = -^B(x) T T 3 {\ 3n )-^' m ^ m b 3G = O [^^-) ■ 

In Theorem [H the influence of 7 n appears to be of only negligible order. In actuality, we 
can use very small j n as long as H(b, A n ,7 n ) is nonsingular. For example, Marx and Eilers 
(1998) used -y n = 1CP 6 . Thus, it is understood that the influence of 7 n is small theoretically 
and numerically. From Theorem [H the conditional Mean Squared Error (MSE) of f}j{xj) can be 
obtained as follows. 

Corollary 1. Under the same assumption as Theorem^ it follows that 

MSE[%(^)|^n] = E[{Vj(xj) ~ Vj(xj)} 2 \X n ] = Op (k- 2(p+ V + X%K% l -^ n - 2 ) + Op^n" 1 ). 

Furthermore, the rate of convergence of the MSE ofrjj(xj) becomes 0(n~^ 2p+2 ^ <y2p+ ^) by taking 
K n = 0(n 1 /(2p+3)) ; Xjn = 0{n u ), v<{p + m + l)/(2p + 3). 

Compared with the kernel estimator, the asymptotic order of MSE of the RCPS is the same as 
that of the local pth. polynomial estimator when p is odd and the number of knots in the spline 
methods and the bandwidth h n in the kernel methods are connected by K n /h~ l = 0(l)(see 
Opsomer (2000)). Lyapunov's condition of the central limit theorem yields the asymptotic 
normality of [?)i(xi) ••• t}d(xd)] t . 

Theorem 2. Suppose there exists 8 > 2 such that E[\Y{ — c'(n(xi))\ 2+s \ X{ = Xi] < oo. Further- 
more, we assume K n = 0{n l ^ 2p+ ^) and \ n = 0(n u ), v < (p + m + l)/(2p + 3). Then, under 
the Assumptions, for any fixed point x = (x\, • • • , xn) £ (0, 1) D , as n — > oo ; 



n 



Vla( x i) - - Biasi(xi) 

f]Da( x D) ~ Vd(x d ) - Bias D (x D ) 



6 



where for j = 1, • • • , D, Biasj(xj) = bj A (xj) + bj t \(xj), and ^ = diag[^i(xi) • • • iI)d{xd)} with 

^ j {x j )= lim ■^s( a; ,) T r,(A in )- 1 r J (o)r J (A in )- 1 s(^), j = i, ■■■ ,d. 

The proof of Theorem [2] is almost the same as that of Theorem 2 of Yoshida and Naito 
(2012). The asymptotic order of the bias and variance of the RCPS in Theorem [1] allows us to 
satisfy Lyapunov's condition for [771 (xi) • • • rjD(xD)] T '• 

We note that an approximate pointwise confidence interval of rjj(xj) can be constructed by 
using the asymptotic distributional result of fjj(xj). However, the asymptotic bias and variance 
of fjj(xj) contain unknown variables and, hence, these should be estimated. For example, we 
replace bo an d Gj with b and n~~ 1 ZjWZj, respectively, where W = diag[c"(Z6)]. Furthermore, 
as it is the pilot estimator of the (p+l)th derivative of rjj, we can utilize the (p+l)th derivative of 
the RCPS fjj with (p+2) or higher degree splines. Thus, we can construct the estimator Biasj(xj) 
and tpj(xj) of Biasj(xj) and ipj(xj), respectively. Consequently, we obtain an approximate 
confidence interval of r]j(xj) by the following Corollary. 

Corollary 2. Under the same assumption as Theorem^ a 100(1 — ot)% asymptotic confidence 
interval of r] Ax A at any fixed point Xj G (0, 1) is 

Vjfcj) ~ Biasjfo) - z a / 2 \J i>j(xj), fjj(xj) - Biasj(xj) + z a/2 yj 4>j(xj) , 
where z a / 2 is the (1 — a/2)th normal percentile. 

Remark 1 We see from the proof of Theorem Q] that the asymptotic form of fjj (xj ) can be 
written as 

fjj(xj) - r]j(xj) = {B{xj) T T j (\ jn )~ l G : j{bQ,\ n ^ n ) + b j>a (xj)} (1 + o P (l)) (5) 

under the same assumption as Theorem [2j where Gj{bo,\ n ,^ n ) is the jth (K n + p)-subvector 
of G(po, A n ,7 n ). From (2.8) of Kauermann et al. (2009), we see that fjj(xj) and the penalized 
spline estimator based on the dataset {(yi,Xij) : i = 1, • • • , n} in GLM have the same asymptotic 
form. Thus, ([5]) indicates that the asymptotic results of the RCPS in the GAM include those 
in the GLM. Note that in GLM(D = 1), we do not need to use the ridge penalty because the 
Hessian of the penalized log-likelihood of b is strictly convex. 

Remark 2 Claeskens et al. (2009) showed the asymptotic bias and variance of the penalized 
spline estimator in a regression model with D = 1. They studied the asymptotics for penalized 
splines in the following two asymptotic scenarios: (a) the value K q appeared in their paper, less 
than 1, and (b) K q > 1. In our setting, Assumption 5 guarantees case (a) and so Theorem [1] 
can be seen as the general version of Theorem 2 (a) of Claeskens et al. (2009) with respect to 
the model and dimension of covariates. If A^ 1 is equal or smaller than the maximum eigenvalue 
of (Zj Zj)^ 1 A^A m (Zj Zj)^ 1 , the asymptotics for the penalized spline estimator in the GAM 
will be demonstrable, such as in Theorem 2 (b) of Claeskens et al. (2009). 
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Remark 3 From Theorem El it is understood that [f)i(xi) ••• t]d(xd)] T are asymptotically 
mutually independent. Wand (1999) showed the asymptotic independence of the kernel es- 
timator in additive models. Hence the penalized spline estimator and the kernel estimator 
have the same asymptotic property. The asymptotic independence of the joint distribution of 
[771(3:1) ■■■ fjD( x D)] T gives some justification for Corollary [21 in which the approximate con- 
fidence interval is constructed based on the asymptotic result of the marginal distribution of 

fjjiXj). 

Remark 4 Clearly, the penalized spline estimator can also be obtained via the backfitting 
algorithm. The asymptotic normality of the backfitting estimator can be shown, although it is 
not discussed in this paper. In additive models, Yoshida and Naito (2012) derived the asymptotic 
normality of the penalized spline estimator obtained by the backfitting algorithm. 

Remark 5 Theorems in this section have been shown for the RCPS with common (p,K n ,m) 
in each covariate. When we construct f)j(xj) using different (p, K n ,m) in each j, the asymptotic 
normality of the RCPS can also be shown. In other words, for f)j(xj) with (pj,Kj n ,rrij) which 
satisfy (pj, Kj n ,rrij) ^ [p.- L , Ki n ,rrii) (j ^ i), Theorems Q] and [2] hold. 



4 The mixed model representation 



In this section, we discuss the penalized spline estimator in relation to mixed models. We 
consider model ([1]) again with rjj(xj) approximated by a pth truncated spline model: 



K n -1 



^2(3ijx l + u k ,j(x- K k )^_, 



i=Q 



k=l 



where (x)+ = max{x,0}. We assume that the random vector Uj = (uij ••• UK n -i,j) T has 
density uj ~ N(0,ajl) with cr| < oo, and ttj and Uj are independent for i ^ j. Hence, 



u = u 



T 



Up] distributes N(0, £„), where S n = diagfcr^J • • • a D l\. Let 



1 Xlj 



1 X 



x lj 



no 



Ri 



(Xlj - Klf_ 



(.Xlj ~ KK n -l)l 



(x n j KK n -l) P + 



Sj = [Xj Rj], S=[S! ■■■ S D ], = (Ay • • • (3 PJ ) T , 0j = [(3j uj] T and G = [0j 
The suggested joint density of (y, u) can be written as 



e T D ] T . 



f(y,u:(3) = f(y\u:(3)f(u) 



exp [y T (S0) - l T c(S0) + l T h(y)] 
exp [y T (S9) - l T c(S0) + l T h(y)] 



1 



V / (2vr) D |S u 
1 

V(2tt) d |S u 



exp 



exp 



2 u 

1 rp 
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where G = diag[0i • • • Qd] and Qj = diag[O p+ i a - 2 1]. As a convenient method of obtaining 



the estimator of and the predictor ii of u, the PQL is often used. In the PQL context, for 
given o\ , • • • , cr^, (0, ii) is defined as the maximizer of 



£03, u) = - log f(y, u\0) = -{y 1 {SO) - I 1 c(S0)} - ^0 T eO + C(E„), 
n n In 



where C(E U ) is not dependent on /3 and u. Let 5(x) = (1 x ■ ■ ■ x p {x — ki) p + ■ ■ ■ (x — KK n ~i) p + ) T ■ 
Then, the PQL fit of r}j(xj) is defined as fjj,p(xj) = S(xj) T 6j, where 9j = [0j uJ] T ■ 

We show the asymptotic distribution of [r)i,p(xi) ■ ■ ■ f]D,p(x£))] T . I 11 order to achieve the 
asymptotic normality of the PQL fits, we consider the equivalence result between the S-spline 
model and the truncated spline model. By the fundamental property of the .B-spline function, 
there exists a (K n +p)th invertible matrix Lj such that Zj = SjLj. Then we obtain Z = SL 
and SO = Zb where L = diag[Li • • • Lp] and b = L~ 1 0. Furthermore, £(0, u) can be rewritten 
as 



1 K 2p 
£(0,u) = £(b) = -{y T (Zb) - l T c(Zb)} - 



b T Q p+1 {E u )b + C(E u ), 



(6) 



where = diagfc^ 2 Ap +1 A p+ i ••• a D 2 Ap +1 A p+ i\. Here we have used the fact that 

T 90 = b T L T QLb = A' 2p 6 T Q p+ i(£„)b. Claeskens et al. (2009) clarified the equality OjQjOj = 
K 2p bJ Ap +1 A p +ibj. By showing the asymptotic distribution of the maximizer [b 1P ■ ■ ■ bp>,p] T 
of £{b), we obtain the asymptotic normality of [7)1^(21) • • • f)D,p(xD)] T ', where 

Tj jtP (xj) = S(x) T dj = SixfLjLj 1 ^ = Z(x) T b jjP . 

Theorem 3. Suppose there exists 5 > 2 such that E[\Y{ — c' (rj(xi))\ 2+s \ X i = Xj\ < 00 and 
r/i, • • • ,7] D e C p+1 . Under K n = 0(nV(?P+ 3 )) and aj 2 = 0{n u ), v < 2/(2p + 3), for any fixed 
point x G (0, l) D , as n — y 00, 

m,p(xi) ~ Vi( x l) ~ Biasi(xi) 



n 

~K~n 



N D (0,Vp), 



Vd,p(xd) - Vd(xd) ~ Bias£>(xD) 
where Biasj(xj) = bj >a (xj) + bj >a (xj), bj^ a {xj) is that given in Proposition 1, 

K 2p 



W*i) = TT^i) Gj'A p+1 A p+1 b j0 = P (K p /(na 2 )), 

no- 

^P = diag[^i,p(xi) ••• iPd,p(xd)] and 

4>j,p(xj) 

= lim ±B(x j f(G j +K^(na*r 1 AT +1 A p+1 )- 1 G J (G J + K%{na]y^ T p+1 A p+1 )^ B( 

Ti r OO J\ 



Remark 6 If we use m = p + 1, the results of Theorem [2] are asymptotically equivalent to those 



of Theorem [3] by replacing Xj n with K 2p a- 2 (j = 1, • • • , D). 
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Remark 7 It should be noted that the maximum likelihood method or the restricted maximum 
likelihood method can be utilized for estimating = 1, • • • , D) by using pseudo data. These 
methods and estimation algorithm based on the Fisher-scoring algorithm are detailed by Breslow 
and Clayton (1993) and by Ruppert et al. (2003). 



5 Applications 

We apply the approximate confidence interval of each covariate r)j{xj) to real datasets. In all 
examples, (p, m) = (3, 2) is adopted. The number of knots and the smoothing parameters are 
chosen via generalized cross-validation. As a pilot estimator of m (xj) in Bias.,(xj), we utilize 
the 4th derivative of the RCPS with a 5th degree £?-spline model. To see the behavior of f]j[xj), 
the partial residual plots 

fjj(x i:j ) + W~ l {yi - c'(f)(xi))), 

for each Xij(j = 1, ••• ,D) are displayed (see Cook and Dabrera (1998) and Landwehr et al. 
(1984)). 

5.1 Kyphosis data 

The kyphosis data set had 81 rows and 4 columns, representing data of children who have had 
corrective spinal surgery. This data is available in the software R (package rpart). For this 
data, the logistic model 

/ exp[rii(xii) -| + 773(2^3)] \ 

Yi ~ Bernoulli — — — ■ -r , z = 1, • • • ,81 

V 1 + exp [771 [Xii ) H h 773 (xis ) J J 

is assumed, where Yi is a factor with levels absent(O) or present(l) indicating whether a kyphosis 
was present (1) after the operation, xn is the age in months, Xi2 is the number of vertebrae 
involved and x^ is the number of the first (topmost) vertebra operated on. We construct the 
RCPS with 7 n = 10 -6 and the approximate confidence intervals for each rjj(xj). 

In Fig. [H for j = 1, 2, 3, the RCPS fjj(xj), the 99% approximate pointwise confidence interval 



fjj{xj) — BiaSj(xj) — 2.58 y ipj(xj), Vj{ x j) ~ Biasj(xj) + 2.58y ipj(xj) , 
and the partial residual are all illustrated. For comparison, r/j ± 2 x (standard error) 



, 3 = 1,2,3 



are also superimposed. In all covariates, smooth intervals are obtained. Marx and Eilers (1998) 
illustrated the RCPS and r/j ± 2 x (standard error) for the same dataset in Fig. 4 of their paper. 
Our results and theirs are similar. However, our interval is wiggles a bit because the asymptotic 
bias is corrected in each covariate. 
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Figure 1: Plots of the kyphosis data with the RCPS (dashed), the 99% approximate confidence 
interval(solid), fjj db 2 x (standard error) (dot-dashed line) and the partial residuals. The left, 
middle and right panels are for 771(2:1), 772(2:2) and 773(3:3), respectively. 



5.2 Air Pollution and Mortality data 

This data set contained air pollution and mortality data for the city of Milan, Italy, over 3652 
consecutive days (i.e., 10 consecutive years: 1st January, 1980 to 30th December, 1989). The 
original data is available on the web site of Ruppert et al. (2003). The relationship between the 
number of deaths in a day and some explanatory variables is modeled as follows 

Yi ~ Poisson[exp(77i(xa) H h 7/5(2^5))], i = !,-■■ , 102, 

where Y{ is the total number of deaths in a day, xn is the number of days since 31st December, 
1979, Xi2 the mean daily temperature in degrees celcius, 2^3 is the relative humidity, 
measure of sulfur dioxide levels (S02) in ambient air and x^s is the total amount of suspended 
particles (TSP) in ambient air. All of these have been measured on public holidays within 
the 3652 days, giving a sample size of n = 102. We constructed the RCPS of rjj(xj) and the 
99% approximate confidence intervals. In Fig. [21 the RCPS, the 99% approximate confidence 
intervals, fjj db 2 x (standard error) and the partial residual for each Xj are illustrated. We see 
that the effect of Biasj(xj) is somewhat large for all covariates. 



6 Simulation 

In this section, we validate Theorem [2] numerically by simulation. The true natural parameter 
utilized in the simulation is defined as 77(2:) = 77i(xi) + 772(x2)+773(x3), where 771(2:1) = sin(27rxi), 
772(2:2) = 2cos(27TX2) and 773(x3) = sin 2 ((7r/2)x3). The design points (xn, Xi2, 2^3) are created 
by 
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Figure 2: Plots of air pollution and mortality data with the RCPS, the 99% approximate 
confidence interval, fjj =t 2 x (standard error) and the partial residuals. 



where Zij(i = 1, • • • , n, j = 1, 2, 3) are generated independently from U(0, 1), the uniform distri- 
bution on [0,1]. We prepared two types of the design, with (i) p = and (ii) p = 0.2. Then, 
the true functions are corrected to satisfy E[r/j(Xj)] = in each (i) and (ii). The response Y{ is 
generated from 

exp[rfi(xn) + T]2{x i2 ) + 773(^3)] 



Yi ~ Bernoulli 



1,- 



, n. 



(7) 



1 + exp[ryi(xii) + r/ 2 (x i2 ) + m{ x in)] 
Our purpose is to compare the density of -/V(0, 1) and the kernel density estimate of the 
simulated Uj(j = 1,2,3), as well as the density of A^(0, 1%) and the kernel density estimate of 
the simulated [Ui,U2] T , [Ui,U3] T and [U2,Us] T to validate Theorem^ where 

^1(^1) — Biasi(xi) 



Ui 

u 3 



n 

K~n 













- m(x2) - 


Bias2 






^2(^2) 






173(^3) 


- m(%3) - 


Bias3 





^3(^3) 



(8) 



Here, 



&( x i) = TT B ( x r> Tt i(A J n)- 1 f i (0)f i (A J „)- 1 J B(^), 



■jnj 



n 



^(ZjWZj + Aj n A^A m ). For j = 1,2,3, BiaSj(xj) is constructed using the same 
method as that in the previous section. The bandwidth discussed by Sheather and Jones (1991) 
is utilized for kernel density estimates. The simulation algorithm described as follows: 
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Step 


1 


For j = 1,2,3 and i = 1, • • • , n, generate Xij from (i) or (ii). 


Step 


2 


Generate the data {(yi, Xi)\i = 1, • • • , n} from ([7]). 


Step 


3 


Calculate fjj{xj){j = 1,2,3) at a fixed point (xi, X2, £3) = (0.5,0.5,0.5). 


step 


4 


Calculate the values of (|8]). 


Step 


5 


Iterate from Step 2 to Step 4, 10000 times. 


Step 


G 


Draw the kernel density estimate of Ui, U2 and U3 and compare with the density of N(0, 1). 


Step 


7 


Draw the kernel density estimate of [U\ U2] 7 ', [Ui r/3] 71 and [U2 Uz] T , and compare with the 






density of iV 2 (0, J 2 ). 



To construct fjj(%j)(j = 1,2,3), we utilize the cubic S-spline (p = 3) and the second differ- 
ence matrix {rn = 2). Furthermore = 2["ra^/^~|, Ai ra = O.Ia^/ ti/ K n , \2n = 

0.01^n/K n and 

A3n = \/n/K n are used. The ridge parameter is chosen as j n = 10 -4 . The sample sizes are set 
at n = 100 and n = 1000. 

In Fig. El the density estimate of ([5D with (i), and the densities of the normal distribution 
are shown. As the sample size increases, the asymptotic normality of the RCPS in Theorem [2] 
can be observed numerically. We see from (1,1), (2,2) and (3,3) panels that the density estimate 
becomes close to when n = 1000. When n = 100, a large correlation between Ui and Uj 
can be observed. However, as n increases, the correlation becomes small. The results with the 
correlated design (ii) are drawn in Fig. HJ The density estimate of Ui appears to be far from 
N(0, 1), even when n = 1000. However, we also find that [U Uj] T tends to become close to 
NiiOjIi) as n — > 00. We have confirmed that the density estimate with Yi ~ Poisson(r/(a;i)) 
tends to become close to the normal distribution as n increases, though this is not shown in this 
paper. However, the speed of convergence of the density estimate with the Poisson model was 
somewhat slower than with the Bernoulli model. 

7 Discussion 

This paper showed the asymptotic normality of the penalized spline estimator in the GAM. The 
results of this paper generalize Theorem 1 of Kauermann et al. (2009) and Theorem 2 of Yoshida 
and Naito (2012). The main tools used to prove our Theorems were the spline approximation 
theories and the asymptotic properties of the band matrices. By applying their properties, the 
asymptotics for penalized splines in other models can be investigated for further study. 

In spline smoothing, the determination of smoothing parameters is very important. Many 
researchers have addressed this problem by using grid search methods, such as Mallow's C p , 
cross-validation and generalized cross-validation. Since the computation time of a grid search 
is dramatically increased when D > 1, more direct methods would be a useful area of study. 
It may be possible to discuss the selection of smoothing parameters based on the asymptotic 
properties in this paper. 
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Bernoulli, Uncorrelated design, n=100, 1000 



Bernoulli, Uncorrected design, n=1000 



Bernojlli, Uncorrelated design, n=1000 




Bernojlli, Uncorrelated design, n^100 Bernojlli, Uncorrelated design, n=100, 1000 Bernoulli, Uncorrelated design, n=1000 





Figure 3: The density estimate of Ui, [Ui Uj] T and the density of iV(0, 1) and N(0,l2) with 
the Bernoulli model and an uncorrelated design. For i = 1,2,3, the panels are the 

density estimates of Ui for n = lOO(dot-dashed) and n = 1000(dashed), and the density of 
N(0, 1) (solid). The (2,1), (3,1) and (3,2) panels are the density estimates of [Ui U 2 ] T , [Ui U 3 } T 
and [U 2 f/ 3 ] T (dashed) for n = 100 and the density of N(0, 1 2 ) (solid). The (1,2), (1,3) and (2,3) 
panels are the density estimates of [U\ U 2 ] 7 ', \U\ U%] T and [U 2 ?7s] T (dashed) for n = 1000 and 
the density of N(0, /2)(solid). In each panel, the contour lines of N(0, 1 2 ) are the same as that 
of the density estimate. 
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Figure 4: The density estimate of Ui, [U{ Uj] and the density of iV(0, 1) and N(Q, 1%) with the 
Bernoulli model and the correlated design. The description of each panel is the same as in Fig. 

El 
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In recent years, the so-called high-dimensional additive models characterized by "n < D" 
have been studied by many authors such as Meier et al. (2009), Huang et al. (2010) and Fan 
et al. (2011). These previous works are based on unpenalized S-spline estimators. Although 
it is beyond the scope of this paper, the asymptotics for penalized splines in high dimensional 
additive models would be interesting to explore. 

Appendix 

For a matrix X n = (Xj,- n )j,-, if max{n a \Xij n \} = Op(l)(op(l)), then it is written as X n = 

Op(n~ a ll T )(op(n~ a ll T )). When X n is vector, it is written as X n = Op(n~ a l). We define 
W = diag[c"(Z6 )], G hn = n^ZjWoZj, G id , n = n^ZfW^ and G iAn = Gj^ n . In the 
sequel, we use H j>n = G jn + (X jn /n)A' m A m + (>y n /n)I(i,j = 1, • • • ,D,i^j). 
We need 3 additional Lemmas as follows. 

Lemma 1. G jjn , Gi :jtn and Hj >n satisfy G j>n = P (K~ 1 11 T ), G i>j)n = O p (K~ 2 H t ) and HJ^ = 
Op(K n ll T ). Let A = (aij)ij be (K n + p) x (K n + p) matrix. Assume that as K n — > oo, A = 
P (K%11 T ). Then, under the Assumptions, G jn A = P (K%- 1 11 T ), G iyj)n A = P (K%- 2 11 T ) 
and HjnA = P (^+ a ll T ). 

Lemma 2. Let Ar) t n be {D(K n + p)} x {D(K n + p)} matrix. Assume that as K n — > oo, 
Ad,u = Op(K®ll T ). Then, under the Assumptions, Ar), n H(bo,\ n ,^ n )~ l = Op(K® +1 ll T ) 

Lemma 3. Under the assumption, for j = 1, • • • , D, A m bjo = 0(K~ m l). 

Lemma Q] can be proven by the properties of the integral of i?-spline basis and the inverse 
of band matrices detailed in Claeskens et al. (2009) and Yoshida and Naito (2012). Then, 
Assumption 5 of this paper indicates that the case K q < 1 of Claeskens et al. (2009). The 
proof of Lemma [2] is addressed in Yoshida and Naito (2012) by induction for D. Lemma [3] can 
be shown from the derivative property of 5-spline model: s^ m \x) = B^ p ~ m \x) T K™ A m b. The 
above equality and Proposition 1 yield B^ p ^'"^ (x) T K™ A m bjo = {x){l + o(l)). Since the 
asymptotic order of B^ p ~ m \x) T K™ A m bjo and each component of K™A m bjo are the same as 
0(1), Lemma [3] holds. The details are clarified in Section 2 of Claeskens et al. (2009). 

Lemma 4. Under the Assumptions, 

b-b = -H(b ,\ n ^ n )~ 1 G(b ,\ nnn ) + o P ( \( XnK n m ) + ^llY 
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proof of Lemma |4] 

We use the Taylor expansion of G(b, A n ,7„) around &o> giving 

= G(6,A n , 7n ) 

= G(b , A n ,7„) + H(b , A n ,7„)(b - 6 ) 

+{H(b + 0(6 - fe ), A n ,7n) - H(bo, X n ,7n)}(b - bo), 

where = diagfwi • • • ^D(K n +p)\ an d G (0, 1). Therefore, b — bo can be written as 

b-b = {- J H"(6o,A„,7n)}~ 1 G(&o,A„,7n) 

-H(b , A n ,7 n ) _1 {i?(b + 0(6 — 6 ),A n ,7 n ) - H(b , X n ,j n )}(b - b ) 
= {-i7(6 ,A n ,7 n )}~ 1 G(&o,A ri ,7n) 

+ J ff(b ,A n , 7 „)- 1 (^Z T {w(b + n(b-b ))-W(b )]z > j (b-b ), (9) 

where W(6) = di&g[c"(Z(xi) T b)]. Furthermore, for i = 1, • • ■ ,n, the Taylor expansion yields 

c"(z( Xi ) T {b + n(b-b )}) = c"(z( Xi ) T b Q ) 

+ c ^[z( Xl ) T b + 0iZ( Xi ) T n(b - b )}z( Xl ) T n(b - b ), 

where 9{ £ (0, 1). Hence from Proposition 1, we obtain 

W(b + 0(6 - 6q)) - W(b ) = diag[c® (Z(a;;) T 6o + 0^(^0(5 - b ))Z{ Xi fn{b - b )] 

= dmg[c^(ri( Xi ))Z( Xl ) T n(b - bo)](l + o(l)) 

= i?(S). 

For simplicity, we rewrite G = G(bo, A n ,7„) and if = —H(bo,X n ,y n ). Then © can be 
written as 



b - h (i H^G + H- 1 ( ^Z T R(b)Z ) (b - 6 (1 ). (10) 



We now prove 



From (|10p . the left hand side of (|lip can be evaluated as 
iT" 1 0-Z T R(b)Z J (6 - 6o) 



if- 1 ^Z T i?(6)Z^ H~ l G + 1 (^Z T i?(b)Z^ | (6 - 6 )- 
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First we show the asymptotic order of R(b). The ith component of R(b) can be written by (jlOD 
as 



C ( 3 )(r/(^))^(^) T ^(b-bo)(l + o(l)) 

= c^^x^Zixifn Ih~ x G + H- 1 (^-Z T R(b)Z^j (b - b ) J (1 + o(l)). 

By calculating the expectation and the square root of variance of each component of G, we 
obtain with Lemma [3] 



nK™ jot; 

Therefore Lemma [2] yields that for z G [0, l] D , 



Since c ( - 3 ^(ry(a;)) is bounded near r](x) for x G [0, 1] 15 , we have with tedious but easy calculation 
that 



(vix^Zixifn ^H^G + H- 1 f^-Z T R{b)Z^j (b-b )J 



< sup 
ze[o,iP 



C ( 3 )(?7(z))Z(2) T Ih^G + F- 1 Qz T #(b)Zj (6 - bo) J 



Op 



n 



+ 



(12) 



Then Lemmas [H and [2] and (fT2j) yield 



F- 1 (^Z T R{b)Z^j H~ l G = H- 1 (±-Z T R(b)Z^ O p N 



H- l o P 



i A n ir ( 



l—m 



+ 



/v n- ti n 

n 



+ 




a; 



Op 



7? 



7 J 



Further we get with simple calculation 



This implies (jlip and completes the proof. □ 
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proof of Proposition Q] 

Barrow and Smith (1978) showed that for j = 1, • • • , D, there exists b*j G M> Kn+p such that 

sup \ Vj (z) + b^z) - B(zfb*\ = o(K-^). 

26(0,1) 

Let rj*(z) = B(z) T b*, rf{x) = Y.f=iVj( x j), Vo(x) = T,f=iVjo{ x j) and b a (x) = T,f=i b jA x j)- 
We now prove that 

b j0 -b* = o(K^ +1 h), j = l,-..,D. (13) 

Since the asymptotic order of rjjo(xj) — r]j(xj) and that of b^ — b* are the same, if (fT3j) is satisfied, 
we obtain for any Xj G (0, 1), \r]jo{ x j) ~ Vj( x j)\ = °(K n ^ p+l ^) hence Proposition 1 holds. 
From the definition of bo, we have 



1 



i=l 



log 



f(Yi\xi,r)) 



f(Yi\xi,b 



X n 



< 



1 



n ^ 



i=i 



log 



f(Yi\xi,ri) 



f(Yi\x h b*) 



X r 



(14) 



where b* = ((6j) T • • • (b* D ) T ) T . The Taylor expansion to c(r]*(xi)) around rj(xi) yields 

f(Yi\xi,ri) 



log' 



= - E [^OH^O*) - »7><)} - W^i)) - c(t,*(x0)}] 

i=l 
1 " 

= ^ E t^(^) - >f (^)}V'(r^))(l + o(l))] 

1=1 

1 n 

= ^E^(^) v, (^)( 1 + °( 1 ))] 

i=l 

= 0{K-^)). 



Therefore we obtain \rj(xi) — r]o(xi)\ = o(l), by which 



1 \ - 

n ^ 



log 



f(Yi\xi,rj) 



f(Yi\xi,bo) 



~ E [M**) ~ Vo(x l )} 2 c"(i 1 (x l ))(l + 0(1))] 



i=l 



-(t? - Zb ) T W(v - Zh 



n 



(15) 



where W = diag[c"(ry(a;j))(l + o(l))] and r\ = {r\{x\) ■ ■ ■ r)(x n )) T . It is easy to show that bo 
satisfies 



-Z T WZb = -Z T Wi] 

n n 

since 6o is the minimizer of (|15p . Further, from the definition of b*, we have 

V = Zb* -B a + o(K-(r +1 h), 



(16) 
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where B a = (6 a (cci) • • • b a (x n )) . Hence, we obtain 



-Z T WZ(b - b*) = --Z T W{B a + o(K~( p+1 h)}. 
n n 



(17) 



By noting Z = \Z\ • • • Zp], the kth component of first (K n + p) block of n 1 Z T WB a can be 
calculated as 

1 - 

(n- 1 ZfWB a ) k = -^t/ , (r,(x i ))B^ k (x il )b a (x i )(l + o(l)) 



i=l 



D 

£ 

D 



i=l 

V / C ,/ (? ? ( a; )) J B_ p+fc ( a;i )6 J> (x,)dP( a; )(l + o(l)) 



-(p+2)> 



(18) 



Here the last equality in (|18p can be obtained by mimicking the proof of Lemma 10 of Agarwal 
and Studden (1980). Similarly since the row sum of n~ l Z T W has an order 0(K~ l ), we get (|17p 
as 

-Z T WZ(b - b*) = --Z T W{B a + o(K- (p+1 h)} = o(K- (p+2) l). 
n n 

From Lemma (TJ we have n~ 1 Z T WZ = 0{K- 1 11 T ). Therefore 

b -b* = o(K-^ +1 h) 

and (I13p can be proven. □ 



To complete the proof of Theorem 1, first we will obtain (A) the asymptotic form of bp — boo- 
And then (B) we will derive the asymptotic form of E[bz> — b£)o\X n ] and V[b£>\X n ]. Similar 
argument will be applied to bj — bjo(j = !,-•• , D — 1). 



proof of Theorem Q] 

First we aim to show (A). From Lemma [H we obtain 

b-b = {-H(bo, A n , 7 n )}" 1 G(6o, A„, 7„) + R n (b) 
1 1 \ _1 

Z T W Z + -Q m (\ n ) + -^1 G(b , An, 7 n) + #n(&), 



n 



n 



n 



where 
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Let M D = n- l {Z T W Z + Q m {X n ) + 7n /) and let A jn = n-^ZjWoZj + X jn A' m A m + ln I). 



Then, Md can be written by using Md-i as 



where i? = [G\D,i, r 
(1985)), we have 



Al, 7 G\^,n 



G2,D,n 



Md-i # t 



(19) 



Gd,1,u ^0,7 
Gfl fl-l,n]' Prom the result of partitioned matrix (see, Horn and Johnson 



M 



D 



M D \ + M D \R T V~ X RM^\ -M^lFV- 1 



-V~ l RM^\ 



V 



-i 



where V = A D)1 -RM D \R T . Let G(_ D )(6 Q , A n ,7 n ) and G D (b , X n ,^ n ) be the first (£>-l)(if n + 
p)th subvector and last (K n + p)th subvector of G(bo, A rt ,7„). Then, 



b-bn = 



b (-D) - & (-D)0 

M D \ + M^R^V^RM^ -Mp\R T V- 1 

+-Rn(6), 



G(_£))(6o) A n ,7 n ) 
Go(bo, A n ,7 n ) 



from which we have 



i>D-b 0:D = y- 1 G D (bo,A„, 7n )-y- 1 i?M D i 1 G ( _ D) (6o,A„,7n) + ^ n . 
= A' 1 G D (b , X n , 7„) + u n (6 ) + #z>,n(6), 



where 



l (&o) 



-^- 1 i2Afji 1 G ( _£, ) (6o,A fl ,7n) + j^ 1 " A^} G D (b , A„, 7 n) 



and RD,n(b) is last (if n +p)th subvector of R n (b). 

In following, we shall start to show (B). The expectation of bp — &o,D can be written as 

E[b D -b 0iD \X n ] = A^E[G D (bo,X n , ln )\X n ] + E[v n (b )\X n ] +E[R D , n (b)\X n ]. 

First, E[Rj) >n (b)\X n ] = op(X n K^ l ~ m n~ 1 ) is satisfied. In the sequel, because 



E[G(b ,0,0)\X n ] 

we have with Lemma [3] 

E[G(b ,X n , 7n )\X n ] 



n 



d_ 
db 



log f(Yi\xi,b ] 



X, 



0, 



E[G(boA0)\X n ] - -Q m (X n )b - ^b 
n n 



Op 



X n K n 



n 



1 . 



21 



From Lemmas [T] and [21 on the other hand, we obtain 

y-'-^n = A^CJ - RM^R 1 A^ 1 - A D \ 

= ^RM-l^A-^I - RM~\R T A-^y 1 

= P {K- l ll T ). 
Therefore we have with straightforward calculation 
E[v n (bo)\X n ] 

= -V^RM^EiG^iboAn^n)} + {V- 1 - A^ n } E[G D (b ,\ n ,Jn)} 



(m+1) 



= °P K n~^K n " 1 + P 

= 0p f^l 

y n 

Above calculations are combined into 

E[b D -b m \X n ] = -^A- D \A' m A m b D0 -^A^\ n b D0 + o P (^~^l) 

XDn -v D {\ Dn y^ m A m b m + op ( huEtlA . 



n y n 

Here we have used the fact A^ 1 = r d(^Du)^ 1 {I + 0p(H T )) and (j n /n)Ti)(Xp) n )^ 1 b£)Q = 
o{\ n Kl l ~ m n~~ l l) . Hence, we finally obtain 

E[f, Da {x D ) - rj Dt o{x)\X n ] = E[B{x D f{b D - b D0 )\X n ] 

= bo^x^ + opiXnK^n- 1 ). 

This implies that the first assertion of Theorem [TJ The variance of tIz) ~(xd) = B(xp)) T b£) can 
be written as 

V[f, Dri {x D )\X n ) = B(^ D ) T A-; 7 y[G D (6o,A n ,7n)|X n ]A D ; 7 B(x D )(l + o P (l)). 

since it is easy to find that the conditional variance of v n (bo) can be shown to be op{K n /n). By 
noting 

V[G D {b Q ,\ n , ln )\X n ] = \z T D V[y\X n ]Z D = \zlWZ D = —Gp> + op((K n /n)~ 1 ll T ), 
we have the second assertion as 

V[fj D ^(x D )\X n ] 

±B(x D ) T A^G D A^B(x D )(l + o P (l)) 
1 



s(x D ) T r D (A Dn )" 1 r D (o)r D (A Dn )- 1 s(x D )(i + 0p (i)). 

n 



22 



Also it is easily confirmed by straightforward calculation with Lemma [T] that for j ^ k 
Covifj^M^k)) = -BixjfAji (-ZjWZ k ) A^B(x k )(l + o P (l)) = O p ( 



n ■"' \ n 

this completely the proof. □ 



proof of Theorem [3] 

Let bp = [b\p ■ ■ ■ b D p] T and let b = (b 1 • • • bp,) T be the maximizer of 



G(b,Z u ,%) = ~{y T (Zb) - l T c(Zb)} + -l T h(y) - —b T Q p+1 (X u )b - ^b T b 
n n in In 

with respect to (frf • • • b^)) T ', where 7 n = o{K%T /a?) and let fjj(xj) = B(xj) T bj(j = 1, • • • ,£)). 
Then the asymptotic normality of [771(^1) • • • jji»(xD)] T can be obtained by the same manner 
to the proof of Theorem [2] with m = p + 1. Similar to Lemma HI bp can be written as 

bp-b = (Z T WZ + Q p+l {T, u ) + ^ n I)- l G(bp^ u ,%)+r n 
= ^ in {Z T WZ + Q p ^ u ) + ^ n I)- 1 b P + r n 



where W = diag[c" \Z bp)] and r n = op(y / K n /nl) is the remainder. Then Lemma [2] yields 
bp - b = Op^nKnU" 1 !) = o P (y/ Kn/nl), by which fj j>P (xj) - fjj(xj) = o P {^K n /n)(j = 
1, • • • , D). This leads to Theorem [3j □ 
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